Label quality score


ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data

Tkachenko, Ulyana, Thyagarajan, Aditya, Mueller, Jonas

arXiv.org Artificial Intelligence

Despite powering sensitive systems like autonomous vehicles, object detection remains fairly brittle in part due to annotation errors that plague most real-world training datasets. We propose ObjectLab, a straightforward algorithm to detect diverse errors in object detection labels, including: overlooked bounding boxes, badly located boxes, and incorrect class label assignments. ObjectLab utilizes any trained object detection model to score the label quality of each image, such that mislabeled images can be automatically prioritized.

Such Swapped errors are also common in many classification datasets (Northcutt et al., 2021a), but the increased complexity of object detection annotation introduces potential for more varied types of label errors than encountered in classification. We propose an algorithm, ObjectLab, that utilizes any trained object detection model to estimate the incorrect labels in such a dataset, regardless of which of these 3 types of mistake the data annotators made. Training and evaluating models with incorrect bounding box annotations is clearly worrisome.
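Because ObjectLab only needs the outputs of a trained detector, the flavor of its per-image scoring can be illustrated with a toy sketch. The matching rule, IoU threshold, and min-pooling below are illustrative assumptions, not the paper's actual scoring algorithm:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def image_label_quality(annotated, predicted, conf_threshold=0.5):
    """Toy per-image label quality score in the spirit of ObjectLab
    (a simplified illustration, not the paper's scoring rule).

    annotated: list of (box, class_id) from the dataset's labels
    predicted: list of (box, class_id, confidence) from any trained detector

    Each annotated box is credited by its best-IoU prediction of the same
    class (penalizing badly located boxes and swapped class labels), while
    confident predictions that match no annotation suggest overlooked boxes.
    """
    scores = []
    for box, cls in annotated:
        # best same-class agreement; low IoU means a bad location or swap
        best = max((iou(box, pb) for pb, pc, _ in predicted if pc == cls),
                   default=0.0)
        scores.append(best)
    for pb, pc, conf in predicted:
        if conf >= conf_threshold:
            covered = max((iou(pb, box) for box, _ in annotated), default=0.0)
            if covered < 0.5:
                scores.append(1.0 - conf)  # likely overlooked annotation
    # pool with the minimum so one bad box flags the whole image
    return float(min(scores)) if scores else 1.0
```

Low-scoring images would then be the first candidates for manual label review.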


Estimating label quality and errors in semantic segmentation data via any model

Lad, Vedang, Mueller, Jonas

arXiv.org Artificial Intelligence

The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or an FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective to identify images that are mislabeled, across multiple types of annotation error.
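The soft-minimum score described above can be sketched as follows; the particular softmin weighting and temperature used here are assumptions for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def softmin_label_quality(probs, labels, temperature=0.1):
    """Score an image's label quality as the soft-minimum of the
    model-estimated likelihood of each pixel's annotated class.

    probs:  (H, W, K) per-pixel predicted class probabilities
    labels: (H, W) annotated class index for each pixel

    Lower scores indicate images more likely to contain annotation errors.
    """
    h, w, _ = probs.shape
    # likelihood the model assigns to each pixel's annotated class
    self_conf = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    x = self_conf.ravel()
    # soft minimum: a weighted average dominated by the least-confident
    # pixels, smoother than a hard min over millions of pixels
    weights = np.exp(-x / temperature)
    return float(np.sum(x * weights) / np.sum(weights))
```

Compared with a hard minimum, the soft-minimum is less sensitive to a single noisy pixel prediction while still emphasizing the worst-annotated regions.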


Identifying Incorrect Annotations in Multi-Label Classification Data

Thyagarajan, Aditya, Snorrason, Elías, Northcutt, Curtis, Mueller, Jonas

arXiv.org Artificial Intelligence

In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image (or document) tagging where each possible tag either applies to a particular image (or document) or not. With many possible classes to consider, data annotators are likely to make errors when labeling such data in practice. Here we consider algorithms for finding mislabeled examples in multi-label classification datasets. We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those which are correctly labeled. Both approaches can utilize any trained classifier. After demonstrating that our methodology empirically outperforms other algorithms for label error detection, we apply our approach to discover many label errors in the CelebA image tagging dataset.
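Since each class in the multi-label setting is an independent yes/no annotation, a per-example quality score can be sketched by checking how plausible the model finds each annotated 0/1 decision. The min-pooling over classes used here is an assumption for illustration (the paper compares several aggregation choices):

```python
import numpy as np

def multilabel_quality_score(pred_probs, given_labels):
    """Rank examples in a multi-label dataset by annotation quality.

    pred_probs:   (N, K) predicted probability that each class applies
    given_labels: (N, K) binary annotation matrix (1 = tag was applied)

    For each class we take the model-estimated probability that the
    annotator's 0/1 decision was correct, then pool with the minimum
    over classes. Lower scores rank examples as more likely mislabeled.
    """
    # probability that the per-class annotation decision is right
    class_conf = np.where(given_labels == 1, pred_probs, 1.0 - pred_probs)
    return class_conf.min(axis=1)
```

Sorting examples by this score ascending surfaces the annotations that most disagree with any trained classifier's predictions.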


Detecting Label Errors in Token Classification Data

Wang, Wei-Chen, Mueller, Jonas

arXiv.org Artificial Intelligence

Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification where many labels must be chosen on a fine-grained basis. Here we consider the task of finding sentences that contain label errors in token classification datasets. We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities output by any token classification model (trained via any procedure). In precision-recall evaluations based on real-world label errors in entity recognition data from CoNLL-2003, we identify a simple and effective method that consistently detects those sentences containing label errors when applied with different token classification models.
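One of the simplest sentence-scoring rules in this family can be sketched as follows; taking the minimum over token self-confidences is one plausible choice among the 11 methods studied, used here purely for illustration:

```python
import numpy as np

def sentence_label_quality(token_probs, token_labels):
    """Score one sentence by the lowest model-estimated probability
    assigned to any token's annotated label.

    token_probs:  (num_tokens, K) predicted class probabilities per token
    token_labels: (num_tokens,) annotated class index for each token

    A single implausibly-labeled token drags the whole sentence's score
    down, so low-scoring sentences are prioritized for review.
    """
    self_conf = token_probs[np.arange(len(token_labels)), token_labels]
    return float(self_conf.min())
```

Ranking all sentences by this score ascending yields the precision-recall style evaluation described above: sentences with genuine label errors should cluster at the top of the review queue.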